    Prophet Attention: Predicting Attention with Future Attention for Image Captioning

    Recently, attention-based models have been used extensively in many sequence-to-sequence learning systems. In image captioning especially, attention-based models are expected to ground the correct image regions with the corresponding generated words. However, at each time step of the decoding process, these models usually use the hidden state of the current input to attend to the image regions. Under this setting, they suffer from a "deviated focus" problem: the attention weights are calculated from the previous words rather than the word to be generated, which impairs both grounding and captioning performance. In this paper, we propose Prophet Attention, which is similar in form to self-supervision. In the training stage, this module uses future information to calculate the "ideal" attention weights towards image regions. These "ideal" weights are then used to regularize the "deviated" attention. In this manner, image regions are grounded with the correct words. The proposed Prophet Attention can be easily incorporated into existing image captioning models to improve both their grounding and captioning performance. Experiments on the Flickr30k Entities and MSCOCO datasets show that Prophet Attention consistently outperforms baselines in both automatic metrics and human evaluations. Notably, we set new state-of-the-art results on the two benchmark datasets and achieve 1st place on the leaderboard of the online MSCOCO benchmark in terms of the default ranking score, i.e., CIDEr-c40.
    Comment: Accepted by NeurIPS 202

    Type-IV DCT, DST, and MDCT algorithms with reduced numbers of arithmetic operations

    We present algorithms for the type-IV discrete cosine transform (DCT-IV) and discrete sine transform (DST-IV), as well as for the modified discrete cosine transform (MDCT) and its inverse, that achieve a lower count of real multiplications and additions than previously published algorithms, without sacrificing numerical accuracy. Asymptotically, the operation count is reduced from ~2N log N to ~(17/9)N log N for a power-of-two transform size N, and the exact count is strictly lowered for all N > 4. These results are derived by considering the DCT to be a special case of a DFT of length 8N, with certain symmetries, and then pruning redundant operations from a recent improved fast Fourier transform algorithm (based on a recursive rescaling of the conjugate-pair split radix algorithm). The improved algorithms for DST-IV and MDCT follow immediately from the improved count for the DCT-IV.
    Comment: 11 page
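
    For orientation, the transform discussed in the abstract above can be written down directly from its textbook definition. The sketch below is a naive O(N^2) reference implementation in Python, not the reduced-operation algorithm the paper describes; the function names `dct_iv` and `idct_iv` and the unnormalized scaling are assumptions chosen for illustration.

```python
import math


def dct_iv(x):
    """Naive O(N^2) DCT-IV (unnormalized, illustrative only).

    X[k] = sum_n x[n] * cos(pi / N * (n + 1/2) * (k + 1/2))
    """
    n_points = len(x)
    return [
        sum(
            x[n] * math.cos(math.pi / n_points * (n + 0.5) * (k + 0.5))
            for n in range(n_points)
        )
        for k in range(n_points)
    ]


def idct_iv(coeffs):
    """Inverse DCT-IV: the same transform scaled by 2/N.

    With this unnormalized definition the DCT-IV matrix C satisfies
    C @ C = (N / 2) * I, so applying it twice and scaling recovers the input.
    """
    n_points = len(coeffs)
    return [2.0 / n_points * value for value in dct_iv(coeffs)]
```

    A fast algorithm would replace the double loop with an FFT-style recursion; a reference like this is mainly useful for checking the output of such an implementation on small inputs.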

    Qwen Technical Report

    Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showing impressive performance even when compared to larger models on complex tasks such as using a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon the base language models. These models demonstrate significantly improved performance compared with open-source models and fall slightly behind the proprietary models.
    Comment: 59 pages, 5 figure

    Methods on COVID-19 epidemic curve estimation during emergency based on Baidu search engine and ILI traditional surveillance in Beijing, China

    Surveillance is essential to the prevention and control of infectious diseases. When the pandemic occurred, the inadequacy of traditional surveillance was exposed, but it also provided a valuable opportunity to explore new surveillance methods. This study aimed to estimate the transmission dynamics and epidemic curve of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Omicron BF.7 in Beijing under the emergency situation using the Baidu index and influenza-like illness (ILI) surveillance. A novel hybrid model (multiattention bidirectional gated recurrent unit (MABG)–susceptible–exposed–infected–removed (SEIR)) was developed, which used a deep learning algorithm (MABG) to analyze past records of ILI occurrences and the Baidu index of diverse symptoms such as fever, pyrexia, cough, sore throat, anti-fever medicine, and runny nose. By considering the current Baidu index and the correlation between ILI cases and coronavirus disease 2019 (COVID-19) cases, a transmission dynamics model (SEIR) was formulated to estimate the transmission dynamics and epidemic curve of SARS-CoV-2. During the COVID-19 pandemic, when conventional surveillance measures were temporarily suspended, ILI cases can serve as a useful indicator for estimating the epidemiological trends of COVID-19. In the specific case of Beijing, it was ascertained that the cumulative infection attack rate surpassed 80.25% (95% confidence interval (95% CI): 77.51%–82.99%) since December 17, 2022, with the apex of the outbreak projected to occur on December 12. The number of existing patients is expected to peak three days after this apex. The effective reproduction number (Rt), which represents the average number of secondary infections generated by a single infected individual at a specific point in time during an epidemic, remained below 1 since December 17, 2022.
    Traditional disease surveillance systems should be complemented with information from modern surveillance data, such as online data sources, with advanced technical support. Modern surveillance channels should be used primarily in emerging infectious disease outbreaks. Syndromic surveillance of COVID-19 should be established to follow the epidemic, clinical severity, and medical resource demand.
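
    To make the SEIR component mentioned in the abstract above concrete, the following Python sketch integrates a basic SEIR model with a forward Euler step and reports a simple effective reproduction number, Rt = (beta / gamma) * S / N. It is a generic textbook formulation, not the study's MABG-SEIR hybrid; the function name `simulate_seir` and all parameter values in the usage example are hypothetical and are not taken from the study.

```python
def simulate_seir(beta, sigma, gamma, population, initial_infected, days, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps (illustrative only).

    beta  : transmission rate per day
    sigma : rate of leaving the exposed compartment (1 / latent period)
    gamma : removal rate (1 / infectious period)
    Returns one (day, S, E, I, R, Rt) tuple per day, including day 0.
    """
    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    steps_per_day = int(round(1.0 / dt))
    history = []
    for day in range(days + 1):
        # Effective reproduction number for this simple model.
        rt = beta / gamma * s / population
        history.append((day, s, e, i, r, rt))
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
    return history


# Hypothetical parameters chosen only to exercise the model.
history = simulate_seir(
    beta=0.6, sigma=1 / 3, gamma=0.2,
    population=1_000_000, initial_infected=10, days=200,
)
final_day, s_end, e_end, i_end, r_end, rt_end = history[-1]
attack_rate = (1_000_000 - s_end) / 1_000_000
print(f"day {final_day}: cumulative attack rate {attack_rate:.1%}, Rt {rt_end:.2f}")
```

    In this formulation the cumulative infection attack rate is the fraction of the population that has left the susceptible compartment, and Rt falls below 1 once enough of the population has been infected.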